76 research outputs found

    A neural network based model effectively predicts enhancers from clinical ATAC-seq samples.

    Enhancers are cis-acting sequences that regulate transcription rates of their target genes in a cell-specific manner and harbor disease-associated sequence variants in cognate cell types. Many complex diseases are associated with enhancer malfunction, necessitating the discovery and study of enhancers from clinical samples. Assay for Transposase Accessible Chromatin (ATAC-seq) technology can interrogate chromatin accessibility from small cell numbers and facilitate studying enhancers in pathologies. However, on average, ~35% of open chromatin regions (OCRs) from ATAC-seq samples map to enhancers. We developed a neural network-based model, Predicting Enhancers from ATAC-Seq data (PEAS), to effectively infer enhancers from clinical ATAC-seq samples by extracting ATAC-seq data features and integrating these with sequence-related features (e.g., GC ratio). PEAS recapitulated ChromHMM-defined enhancers in CD14+ monocytes, CD4+ T cells, GM12878, peripheral blood mononuclear cells, and pancreatic islets. PEAS models trained on these five cell types effectively predicted enhancers in four cell types that were not used in model training (EndoC-βH1, naïve CD8+ T, MCF7, and K562 cells). Finally, PEAS inferred individual-specific enhancers from 19 islet ATAC-seq samples and revealed variability in enhancer activity across individuals, including variability driven by genetic differences. PEAS is an easy-to-use tool developed to study enhancers in pathologies by taking advantage of the increasing number of clinical epigenomes.
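    The abstract names GC ratio as one of the sequence-related features combined with ATAC-seq features. As a minimal sketch of what such a feature might look like, the function below computes the fraction of G and C bases in a region's sequence. The function name and the choice to ignore ambiguous bases such as N are assumptions for illustration; the abstract does not say how PEAS computes this feature.

```python
def gc_ratio(sequence: str) -> float:
    """Fraction of G/C bases among unambiguous bases (A, C, G, T).

    Hypothetical helper for illustration only; ambiguous bases such as
    N are ignored, which is an assumption rather than a documented choice.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    # An empty or fully ambiguous sequence has no defined GC ratio; return 0.0.
    return gc / total if total else 0.0


if __name__ == "__main__":
    print(gc_ratio("ATGCGC"))
```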

    BiFET: sequencing Bias-free transcription factor Footprint Enrichment Test.

    Transcription factor (TF) footprinting uncovers putative protein-DNA binding via combined analyses of chromatin accessibility patterns and their underlying TF sequence motifs. TF footprints are frequently used to identify TFs that regulate activities of cell/condition-specific genomic regions (target loci) in comparison to control regions (background loci) using standard enrichment tests. However, there is a strong association between the chromatin accessibility level and the GC content of a locus and the number and types of TF footprints that can be detected at this site. Traditional enrichment tests (e.g. hypergeometric) do not account for this bias and inflate false positive associations. Therefore, we developed a novel post-processing method, Bias-free Footprint Enrichment Test (BiFET), that corrects for the biases arising from the differences in chromatin accessibility levels and GC contents between target and background loci in footprint enrichment analyses. We applied BiFET to TF footprint calls obtained from EndoC-βH1 ATAC-seq samples using three different algorithms (CENTIPEDE, HINT-BC and PIQ) and showed BiFET's ability to increase power and reduce the false positive rate when compared to the hypergeometric test. Furthermore, we used BiFET to study TF footprints from human PBMC and pancreatic islet ATAC-seq samples to show its utility in identifying putative TFs associated with cell-type-specific loci.
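    For context, the traditional hypergeometric enrichment test that the abstract uses as a baseline can be sketched as follows. This is not BiFET: it applies no correction for chromatin accessibility or GC content, which is the bias the abstract describes. The function name and argument names are illustrative assumptions.

```python
from math import comb


def hypergeom_enrichment_pvalue(k: int, K: int, n: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: total loci (target + background)
    K: loci among the N that carry the TF footprint
    n: number of target loci
    k: target loci that carry the TF footprint
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probability mass of observing k or more footprinted target loci.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 1000 loci, 100 with a footprint, 50 targets, 12 of them footprinted.
    print(hypergeom_enrichment_pvalue(12, 100, 50, 1000))
```

    A small p-value here only says that footprints are over-represented among target loci; if target loci are also more accessible or more GC-rich than background loci, that over-representation may reflect the bias rather than TF activity.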

    A new graph-based clustering method with application to single-cell RNA-seq data from human pancreatic islets.

    Traditional bulk RNA-sequencing of human pancreatic islets mainly reflects the transcriptional response of major cell types. Single-cell RNA sequencing technology enables transcriptional characterization of individual cells, and thus makes it possible to detect cell types and subtypes. To tackle the heterogeneity of single-cell RNA-seq data, powerful and appropriate clustering is required to facilitate the discovery of cell types. In this paper, we propose a new clustering framework based on a graph-based model with various types of dissimilarity measures. We take the compositional nature of single-cell RNA-seq data into account and employ log-ratio transformations. The practical merit of the proposed method is demonstrated through application to centered log-ratio-transformed single-cell RNA-seq data from human pancreatic islets and through comparisons with existing single-cell clustering methods. The R package for the proposed method can be found at https://github.com/Zhang-Data-Science-Research-Lab/LrSClust
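    The centered log-ratio (CLR) transformation mentioned above subtracts the mean log value from each log-transformed component of a composition. A minimal sketch for a single cell's count vector follows. The pseudocount used to handle zero counts is an assumption made here for illustration; the abstract does not describe how zeros are treated, and this sketch is not taken from the LrSClust package.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one cell's gene counts.

    clr(x)_i = log(x_i) - mean_j(log(x_j)).
    The pseudocount is an illustrative assumption so that zero counts
    have a defined logarithm.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


if __name__ == "__main__":
    print(clr([0, 3, 10, 1]))
```

    By construction, the transformed values of each cell sum to zero, so only the relative abundances of genes within a cell are retained.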

    CoRE-ATAC: A deep learning model for the functional classification of regulatory elements from single cell and bulk ATAC-seq data.

    Cis-Regulatory elements (cis-REs) include promoters, enhancers, and insulators that regulate gene expression programs via binding of transcription factors. ATAC-seq technology effectively identifies active cis-REs in a given cell type (including from single cells) by mapping accessible chromatin at base-pair resolution. However, these maps are not immediately useful for inferring specific functions of cis-REs. For this purpose, we developed a deep learning framework (CoRE-ATAC) with novel data encoders that integrate DNA sequence (reference or personal genotypes) with ATAC-seq cut sites and read pileups. CoRE-ATAC was trained on 4 cell types (n = 6 samples/replicates) and accurately predicted known cis-RE functions from 7 cell types (n = 40 samples) that were not used in model training (mean average precision = 0.80, mean F1 score = 0.70). CoRE-ATAC enhancer predictions from 19 human islet samples coincided with genetically modulated gain/loss of enhancer activity, which was confirmed by massively parallel reporter assays (MPRAs). Finally, CoRE-ATAC effectively inferred cis-RE function from aggregate single nucleus ATAC-seq (snATAC) data from human blood-derived immune cells that overlapped with known functional annotations in sorted immune cells, which established the efficacy of these models to study cis-RE functions of rare cells without the need for cell sorting. ATAC-seq maps from primary human cells reveal individual- and cell-specific variation in cis-RE activity. CoRE-ATAC increases the functional resolution of these maps, a critical step for studying regulatory disruptions behind diseases

    Functional characterization of T2D-associated SNP effects on baseline and ER stress-responsive β cell transcriptional activation.

    Genome-wide association studies (GWAS) have linked single nucleotide polymorphisms (SNPs) at >250 loci in the human genome to type 2 diabetes (T2D) risk. For each locus, identifying the functional variant(s) among multiple SNPs in high linkage disequilibrium is critical to understand molecular mechanisms underlying T2D genetic risk. Using massively parallel reporter assays (MPRA), we test the cis-regulatory effects of SNPs associated with T2D and altered in vivo islet chromatin accessibility in MIN6 β cells under steady state and pathophysiologic endoplasmic reticulum (ER) stress conditions. We identify 1,982/6,621 (29.9%) SNP-containing elements that activate transcription in MIN6 and 879 SNP alleles that modulate MPRA activity. Multiple T2D-associated SNPs alter the activity of short interspersed nuclear element (SINE)-containing elements that are strongly induced by ER stress. We identify 220 functional variants at 104 T2D association signals, narrowing 54 signals to a single candidate SNP. Together, this study identifies elements driving β cell steady state and ER stress-responsive transcriptional activation, nominates causal T2D SNPs, and uncovers potential roles for repetitive elements in β cell transcriptional stress response and T2D genetics.

    Single Cell Analysis of Blood Mononuclear Cells Stimulated Through Either LPS or Anti-CD3 and Anti-CD28.

    Immune cell activation assays have been widely used for immune monitoring and for understanding disease mechanisms. However, these assays are typically limited in scope. A holistic study of circulating immune cell responses to different activators is lacking. Here we developed a cost-effective high-throughput multiplexed single-cell RNA-seq combined with epitope tagging (CITE-seq) to determine how classic activators of T cells (anti-CD3 coupled with anti-CD28) or monocytes (LPS) alter the cell composition and transcriptional profiles of peripheral blood mononuclear cells (PBMCs) from healthy human donors. Anti-CD3/CD28 treatment activated all classes of lymphocytes either directly (T cells) or indirectly (B and NK cells) but reduced monocyte numbers. Activated T and NK cells expressed senescence and effector molecules, whereas activated B cells transcriptionally resembled autoimmune disease- or age-associated B cells (e.g., CD11c, T-bet). In contrast, LPS specifically targeted monocytes and induced two main states: early activation characterized by the expression of chemoattractants and a later pro-inflammatory state characterized by expression of effector molecules. These data provide a foundation for future immune activation studies with single cell technologies (https://czi-pbmc-cite-seq.jax.org/)

    Tet2 Controls the Responses of β cells to Inflammation in Autoimmune Diabetes.

    β cells may participate in and contribute to their own demise during Type 1 diabetes (T1D). Here we report a role of their expression of Tet2 in regulating immune killing. Tet2 is induced in murine and human β cells with inflammation but its expression is reduced in surviving β cells. Tet2-KO mice that receive WT bone marrow transplants develop insulitis but not diabetes and islet infiltrates do not eliminate β cells even though immune cells from the mice can transfer diabetes to NOD/scid recipients. Tet2-KO recipients are protected from transfer of disease by diabetogenic immune cells. Tet2-KO β cells show reduced expression of IFNγ-induced inflammatory genes that are needed to activate diabetogenic T cells. Here we show that Tet2 regulates pathologic interactions between β cells and immune cells and controls damaging inflammatory pathways. Our data suggest that eliminating TET2 in β cells may reduce the activation of pathologic immune cells and the killing of β cells.

    Modeling islet enhancers using deep learning identifies candidate causal variants at loci associated with T2D and glycemic traits.

    Genetic association studies have identified hundreds of independent signals associated with type 2 diabetes (T2D) and related traits. Despite these successes, the identification of specific causal variants underlying a genetic association signal remains challenging. In this study, we describe a deep learning (DL) method to analyze the impact of sequence variants on enhancers. Focusing on pancreatic islets, a T2D relevant tissue, we show that our model learns islet-specific transcription factor (TF) regulatory patterns and can be used to prioritize candidate causal variants. At 101 genetic signals associated with T2D and related glycemic traits where multiple variants occur in linkage disequilibrium, our method nominates a single causal variant for each association signal, including three variants previously shown to alter reporter activity in islet-relevant cell types. For another signal associated with blood glucose levels, we biochemically test all candidate causal variants from statistical fine-mapping using a pancreatic islet beta cell line and show biochemical evidence of allelic effects on TF binding for the model-prioritized variant. To aid in future research, we publicly distribute our model and islet enhancer perturbation scores across ~67 million genetic variants. We anticipate that DL methods like the one presented in this study will enhance the prioritization of candidate causal variants for functional studies

    Variation in histone configurations correlates with gene expression across nine inbred strains of mice.

    The diversity outbred (DO) mice and their inbred founders are widely used models of human disease. However, although the genetic diversity of these mice has been well documented, their epigenetic diversity has not. Epigenetic modifications, such as histone modifications and DNA methylation, are important regulators of gene expression, and as such are a critical mechanistic link between genotype and phenotype. Therefore, creating a map of epigenetic modifications in the DO mice and their founders is an important step toward understanding mechanisms of gene regulation and the link to disease in this widely used resource. To this end, we performed a strain survey of epigenetic modifications in hepatocytes of the DO founders. We surveyed four histone modifications (H3K4me1, H3K4me3, H3K27me3, and H3K27ac), and DNA methylation. We used ChromHMM to identify 14 chromatin states, each of which represented a distinct combination of the four histone modifications. We found that the epigenetic landscape was highly variable across the DO founders and was associated with variation in gene expression across strains. We found that epigenetic state imputed into a population of DO mice recapitulated the association with gene expression seen in the founders suggesting that both histone modifications and DNA methylation are highly heritable mechanisms of gene expression regulation. We illustrate how DO gene expression can be aligned with inbred epigenetic states to identify putative cis-regulatory regions. Finally, we provide a data resource that documents strain-specific variation in chromatin state and DNA methylation in hepatocytes across nine widely used strains of laboratory mice

    AMULET: a novel read count-based method for effective multiplet detection from single nucleus ATAC-seq data.

    Detecting multiplets in single nucleus (sn)ATAC-seq data is challenging due to data sparsity and limited dynamic range. AMULET (ATAC-seq MULtiplet Estimation Tool) enumerates regions with greater than two uniquely aligned reads across the genome to effectively detect multiplets. We evaluate the method by generating snATAC-seq data from human blood and pancreatic islet samples. Compared to alternatives, AMULET has high precision, estimated via donor-based multiplexing, and high recall, estimated via simulated multiplets. It identifies multiplets most effectively when a read depth of 25K median valid reads per nucleus is achieved.
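    The counting step described above (enumerating regions covered by more than two reads) can be sketched with a sweep over read start and end positions. This is one plausible reading of the abstract for a single nucleus and a single chromosome, not the AMULET implementation: the function name, the half-open interval convention, and the omission of any downstream statistical test are assumptions.

```python
def count_regions_over_depth(fragments, max_depth=2):
    """Count maximal regions where more than `max_depth` reads overlap.

    fragments: iterable of (start, end) half-open intervals from one
    nucleus on one chromosome (hypothetical input format).
    """
    events = []
    for start, end in fragments:
        events.append((start, 1))   # a read begins: depth goes up
        events.append((end, -1))    # a read ends: depth goes down
    # Tuples sort by position, then delta, so at equal positions ends (-1)
    # come before starts (+1) and touching intervals do not count as overlapping.
    events.sort()

    depth = 0
    regions = 0
    in_region = False
    for _, delta in events:
        depth += delta
        if depth > max_depth and not in_region:
            regions += 1
            in_region = True
        elif depth <= max_depth and in_region:
            in_region = False
    return regions


if __name__ == "__main__":
    print(count_regions_over_depth([(0, 10), (5, 15), (8, 20)]))
```

    A nucleus with an unusually large number of such regions would be a multiplet candidate under this reading; how that number is turned into a call is not specified in the abstract and is left out here.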